fix(vision): scope the no-vision capability error to the latest user image by Nillth · Pull Request #8180 · zeroclaw-labs/zeroclaw

Nillth · 2026-06-22T16:33:31Z

Summary

Base branch: master
What changed and why:
- Sending an image to a model_provider without vision support (and with no
  vision_model_provider configured) raised provider_capability_error capability=vision
  AND left the [IMAGE:] marker in the long-lived session history. Because the error was
  keyed off a history-wide marker count, every later turn (even plain text) re-counted the
  stale marker and re-failed forever. The RPC/streaming path makes it permanent: it persists
  the inbound user message into the session history before the turn loop runs, so a failed
  image turn leaves its marker behind. A single image to a non-vision provider made the
  session unusable until restart.
- Scope the capability error to the most recent genuine user message. resolve_vision_provider
  now errors only when the user just sent an image we cannot see; a carried-over marker
  (a prior failed image turn, or a vision -> non-vision model switch mid-session) degrades to
  text-only (markers stripped, surrounding text preserved) so the conversation continues.
- New count_latest_user_image_markers() in zeroclaw-providers, the turn-scoped counterpart
  to count_user_image_markers(), reusing the same genuine-user-message predicate.
- The user is still told once, on the turn they send the image; subsequent text turns recover.
Scope boundary: No change when a vision_model_provider is configured, no change to
tool-result image degradation (already degraded), no change to the channel orchestrator path
(which never persisted failed-turn images). No config / CLI / API / env surface change.
Blast radius: resolve_vision_provider is the single shared chokepoint inside
run_tool_call_loop (one call site, turn/mod.rs), so the behaviour change reaches every
transport (RPC/streaming, non-streaming, channels). The new function is purely additive.
Linked issue(s): None found (no existing issue).
Labels: bug applied. The risk: and size: labels are auto-applied by the repo labeler
(the path scope labels agent/provider/runtime are already on); the auto-risk rule
classifies runtime-path changes as higher risk, so risk/size are left to the automation.
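
A minimal sketch of the difference between the history-wide count and the turn-scoped count described in the summary. The function names come from this PR; everything else is an assumption made for illustration: the `Message` type, the `[Tool results]` prefix used to detect tool-result carriers, and the function bodies are hypothetical stand-ins, not the code in zeroclaw-providers.

```rust
/// Hypothetical stand-in for the crate's chat message type.
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Marker prefix as it appears in the PR text.
const IMAGE_MARKER: &str = "[IMAGE:";

/// Assumption: tool results carried in user-role messages start with this tag.
fn is_tool_result_carrier(msg: &Message) -> bool {
    msg.content.starts_with("[Tool results]")
}

/// Shared predicate: a user-role message that is not a tool-result carrier.
fn is_genuine_user_message(msg: &Message) -> bool {
    msg.role == "user" && !is_tool_result_carrier(msg)
}

fn markers_in(text: &str) -> usize {
    text.matches(IMAGE_MARKER).count()
}

/// History-wide count: every genuine user message contributes.
pub fn count_user_image_markers(history: &[Message]) -> usize {
    history
        .iter()
        .filter(|m| is_genuine_user_message(m))
        .map(|m| markers_in(&m.content))
        .sum()
}

/// Turn-scoped count: only the newest genuine user message contributes.
pub fn count_latest_user_image_markers(history: &[Message]) -> usize {
    history
        .iter()
        .rev()
        .find(|m| is_genuine_user_message(m))
        .map_or(0, |m| markers_in(&m.content))
}
```

With a history of an image turn followed by a plain-text turn, the history-wide count stays at 1 while the turn-scoped count drops to 0, which is the distinction the fix relies on.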

Validation Evidence (required)

Toolchain pinned to CI's 1.93.0.

cargo fmt --all -- --check                                   # clean
cargo clippy --all-targets -- -D warnings                    # clean (root; validate.sh)
cargo clippy -p zeroclaw-providers -p zeroclaw-runtime -p zeroclaw-channels --all-targets -- -D warnings  # exit 0
cargo test -p zeroclaw-providers -p zeroclaw-runtime
cargo test -p zeroclaw-channels vision

Tail output:

providers: test result: ok. 1022 passed; 0 failed
runtime:   2343 passed; 1 failed; 2 ignored
  count_latest_user_image_markers_scopes_to_newest_user_message ... ok
  run_tool_call_loop_degrades_carried_over_image_on_non_vision_provider ... ok
  run_tool_call_loop_returns_structured_error_for_non_vision_provider ... ok   (preserved)
channels:  test result: ok. 6 passed; 0 failed
  e2e_failed_vision_turn_does_not_poison_follow_up_text_turn ... ok
  e2e_photo_attachment_rejected_by_non_vision_provider ... ok

Beyond CI - what I manually verified: Traced the RPC poison path (the streaming path
persists the user message to the long-lived session history before the loop). Confirmed the
single call site of resolve_vision_provider. Reproduced the fix end-to-end through the
shared engine (carried-over image -> stripped to [media attachment], plain-text turn
succeeds). Confirmed the first-turn capability error is preserved.
The one runtime test failure is unrelated:
cron::store::tests::remove_job_emits_structured_cron_delete_event is a pre-existing
test-isolation flake (a UUID assert against a process-global log broadcast; a sibling
remove_job caller races its event in under in-process parallelism). The diff touches no
cron code, the test passes in isolation, and CI's nextest (process-per-test) isolates it.
Intentionally skipped: the exact --features ci-all clippy combo (needs glib-2.0 /
libudev system libs unavailable on this host). The change touches none of the voice/desktop
features ci-all adds; deferred to CI. A docs-coverage heuristic WARN fired on the new
pub fn, but it is an internal cross-crate helper, not a user-facing surface, so no docs are
needed.
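
The poison path traced above can be reproduced in miniature. This is a toy model under stated assumptions, not the runtime's code: `Session`, `Gate`, and `turn` are hypothetical names, the history holds only user messages, and the error is reduced to the string from the PR text. The one property it keeps is that the inbound message is persisted before the capability gate runs.

```rust
/// Which marker count the capability gate is keyed off (hypothetical).
#[derive(Clone, Copy)]
pub enum Gate {
    HistoryWide,
    LatestUserMessage,
}

/// Toy long-lived session: user messages only, for brevity.
pub struct Session {
    history: Vec<String>,
}

impl Session {
    pub fn new() -> Self {
        Session { history: Vec::new() }
    }

    /// One turn against a provider with no vision support and no
    /// vision_model_provider configured.
    pub fn turn(&mut self, user_text: &str, gate: Gate) -> Result<&'static str, String> {
        // Persist first, as the RPC/streaming path is described as doing.
        // A failed turn therefore still leaves its marker in the history.
        self.history.push(user_text.to_string());

        let marker_count: usize = match gate {
            Gate::HistoryWide => self
                .history
                .iter()
                .map(|m| m.matches("[IMAGE:").count())
                .sum(),
            Gate::LatestUserMessage => self
                .history
                .last()
                .map_or(0, |m| m.matches("[IMAGE:").count()),
        };

        if marker_count > 0 {
            return Err("provider_capability_error capability=vision".to_string());
        }
        Ok("text-only turn completed")
    }
}
```

Under `Gate::HistoryWide` a plain-text turn after an image turn still fails, because the stale marker is re-counted; under `Gate::LatestUserMessage` only the image turn itself fails.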

Security & Privacy Impact (required)

New permissions, capabilities, or file system access scope? No
New external network calls? No
Secrets / tokens / credentials handling changed? No
PII, real identities, or personal data in diff, tests, fixtures, or docs? No
The degrade path only strips media-marker references from text sent to a text-only provider;
the image bytes never reach the model in either branch, and no trust boundary or policy check
is affected.

Compatibility (required)

Backward compatible? Yes
Config / env / CLI surface changed? No
Upgrade steps: none. Behaviour change is strictly a bug fix: a previously-poisoned session
now recovers on the next turn instead of failing permanently.

Rollback (required for `risk: medium` and `risk: high`)

Fast rollback command/path: git revert <sha> (single, self-contained commit; the new
function is additive, so reverting cleanly restores the prior history-wide-count behaviour).
Feature flags or config toggles: None. (Operators wanting the prior routing can configure
a vision_model_provider, which is unaffected by this change.)
Observable failure symptoms: grep logs for provider_capability_error with
capability=vision, or for the degrade WARN "no vision route for carried-over/tool-result
image marker(s); degrading to text-only". A regression would show the capability error
re-firing on plain-text turns after an image, or an image the user just sent being silently
dropped.

…image

Sending an image to a model_provider without vision support (and with no
vision_model_provider configured) raised a provider_capability_error AND left the
[IMAGE:] marker in the long-lived session history. The capability error was triggered by a
history-wide marker count, so every later turn, even plain text, re-counted the stale marker
and re-failed forever. The RPC/streaming path makes this permanent: it persists the user
message into the session history before the loop runs, so a failed image turn leaves its
marker behind. A single image to a non-vision provider made the session unusable until
restart.

Scope the capability error to the most recent genuine user message:
- providers/multimodal: add count_latest_user_image_markers(), the turn-scoped counterpart
  to count_user_image_markers (skips tool-result carriers and older user messages).
- runtime/turn/vision_route: error only when the latest user message carries an image (the
  user just sent something we cannot see); a carried-over marker, from an earlier failed turn
  or a vision to non-vision model switch, degrades to text-only (markers stripped) so the
  turn continues. This also covers the model-switch case.

The user is still told once, on the turn they send the image; subsequent text turns recover
instead of re-failing.

Tests:
- providers: count_latest_user_image_markers_scopes_to_newest_user_message
- runtime: run_tool_call_loop_degrades_carried_over_image_on_non_vision_provider
  (end-to-end through the shared engine; asserts the carried-over marker is stripped and the
  plain-text turn succeeds)

singlerider

First review (no prior reviews/comments). Verified at head a9084c42 (CI green, MERGEABLE). Read the source; did not run local Cargo. All 🟢; approving. This fixes a real session-bricking bug at the right chokepoint.

🟢 The latest-vs-carried-over distinction is the correct fix

resolve_vision_provider (vision_route.rs:12) now branches on count_latest_user_image_markers (the newest genuine user message) rather than the history-wide count. The three arms are exactly right:

- User just sent an unviewable image (latest_user_image_marker_count > 0, no vision_model_provider): surface ProviderCapabilityError so the attachment is not silently dropped.
- Only carried-over/tool-result markers remain (else): degrade_strip_images = true, WARN once, continue text-only. This is what stops a single failed image turn from re-failing every later plain-text turn forever, which is the reported unusable-until-restart bug.
- vision_model_provider configured: route to it; a misconfigured non-vision route surfaces loudly.
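
The three arms can be sketched as a small decision function. This is an illustration under assumptions, not the signature of resolve_vision_provider: `VisionRoute`, `CapabilityError`, `resolve_vision_route_sketch`, and the four parameters are hypothetical, and the early return for a vision-capable primary provider is assumed rather than taken from the diff.

```rust
/// Hypothetical outcome of the routing decision.
#[derive(Debug, PartialEq)]
pub enum VisionRoute {
    /// No markers, or the primary provider can see images itself.
    Primary,
    /// Route the turn to the configured vision_model_provider.
    Dedicated(String),
    /// Strip markers from the outbound copy and continue text-only.
    DegradeToText,
}

/// Hypothetical stand-in for the structured capability error.
#[derive(Debug, PartialEq)]
pub struct CapabilityError {
    pub capability: &'static str,
}

pub fn resolve_vision_route_sketch(
    primary_supports_vision: bool,
    vision_model_provider: Option<&str>,
    total_marker_count: usize,
    latest_user_marker_count: usize,
) -> Result<VisionRoute, CapabilityError> {
    // Assumed fast path: nothing to route.
    if total_marker_count == 0 || primary_supports_vision {
        return Ok(VisionRoute::Primary);
    }
    // A configured vision route is unaffected by the fix.
    if let Some(name) = vision_model_provider {
        return Ok(VisionRoute::Dedicated(name.to_string()));
    }
    // The user just sent an image nobody can see: tell them once.
    if latest_user_marker_count > 0 {
        return Err(CapabilityError { capability: "vision" });
    }
    // Only carried-over or tool-result markers remain: degrade, do not fail.
    Ok(VisionRoute::DegradeToText)
}
```

Keying the error arm off the latest-message count, rather than the total, is what lets a stale marker fall through to the degrade arm.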

🟢 No SSOT violation and the degrade path is non-destructive

count_latest_user_image_markers (multimodal.rs:293) reuses the same is_prompt_tool_result_message genuine-user predicate as count_user_image_markers; it is the turn-scoped counterpart, not a second source of truth. The degrade flag is threaded from resolve_vision_provider (turn/mod.rs:412) into prepare_messages_for_iteration (:439), which strips markers via strip_media_markers on a COPY of the outbound messages (vision_route.rs:138), so the long-lived session history is never mutated and the surrounding caption/metadata text survives. No filesystem path or data URI reaches the text-only provider.
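
A rough sketch of the non-destructive degrade described above: markers are stripped on a copy of the outbound messages, so the stored history keeps its original text. The `Message` type, `strip_image_markers`, and `outbound_text_only` are hypothetical names for illustration (the real helper is strip_media_markers, whose implementation is not shown in this PR page). The `[media attachment]` placeholder follows the PR description; the assumption that a marker ends at the first `]` is mine.

```rust
/// Hypothetical stand-in for the crate's chat message type.
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Replace each `[IMAGE:...]` marker with a neutral placeholder, keeping
/// the surrounding caption text. Assumes a marker ends at the first `]`.
pub fn strip_image_markers(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(start) = rest.find("[IMAGE:") {
        out.push_str(&rest[..start]);
        match rest[start..].find(']') {
            Some(end) => {
                out.push_str("[media attachment]");
                rest = &rest[start + end + 1..];
            }
            None => {
                // Unterminated marker: leave the remaining text untouched.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

/// Build the outbound copy; the borrowed history is never mutated.
pub fn outbound_text_only(history: &[Message]) -> Vec<Message> {
    history
        .iter()
        .map(|m| Message {
            role: m.role.clone(),
            content: strip_image_markers(&m.content),
        })
        .collect()
}
```

Taking the history by shared reference makes the "copy, do not mutate" property visible in the signature: the function cannot alter the long-lived session history even by accident.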

🟢 Single chokepoint, attributed logging, honest disclosure

resolve_vision_provider is the one call site inside run_tool_call_loop (turn/mod.rs), so the behaviour change reaches every transport (RPC/streaming, non-streaming, channels) uniformly. The degrade WARN carries category/outcome/attrs (not a bare record!). Tests pin all three arms (count_latest_user_image_markers_scopes_to_newest_user_message, run_tool_call_loop_degrades_carried_over_image_on_non_vision_provider, and the preserved run_tool_call_loop_returns_structured_error_for_non_vision_provider). The disclosed "1 failed" runtime test is the pre-existing cron::store test-isolation flake (process-global log broadcast race, passes in isolation, isolated by CI nextest), unrelated to this diff which touches no cron code; official CI is green.

Additive, risk:high handled correctly, scope tight (3 files). Approving.


Nillth requested review from Audacity88 and singlerider as code owners June 22, 2026 16:33

github-actions Bot added agent Auto scope: src/agent/** changed. provider Auto scope: src/providers/** changed. runtime Auto scope: src/runtime/** changed. labels Jun 22, 2026

Nillth added the bug Something isn't working label Jun 22, 2026

Audacity88 added risk: high Auto risk: security/runtime/gateway/tools/workflows. size: M Auto size: 251-500 non-doc changed lines. labels Jun 22, 2026

Audacity88 added this to the v0.8.3 milestone Jun 22, 2026

Audacity88 mentioned this pull request Jun 22, 2026

[Tracker]: v0.8.3 runtime, agent, tools, and execution stability #8071

Open

65 tasks

singlerider approved these changes Jun 23, 2026


Nillth mentioned this pull request Jun 23, 2026

fix(loop): gate path-listing tool results from vision routing #7345

Merged

Nillth merged commit 451b15e into zeroclaw-labs:master Jun 23, 2026
19 checks passed

Nillth merged 1 commit into zeroclaw-labs:master from NNet-Dev:fix/vision-marker-history-poison
